Although the evaluation of innovations in teacher training is often dependent on 
observational data, the problem of the reliability of the observations collected by a 
team of observers has often been treated in a superficial manner, partially because 
of the difficulties of maintaining an observer team intact over an extended period of 
time and of being permitted to observe each teacher a number of times. An analysis 
of variance model has been developed which permits the calculation of an overall 
reliability coefficient and the partitioning of the sources of variation for the typical 
observer team situation in which the team visits a number of different teachers only 
once and where the team does not necessarily contain the same members for all 
visits. The paradigm is developed for the situations in which there are n 
observations per item per observer as well as when there is only one observation per 
item per observer. The model has been tested using data based on the School 
University Teacher Education Center (SUTEC) observation schedule which is designed 
to investigate seven aspects of classroom behavior. Since the proposed model 
permitted the partitioning of the variance associated with the component parts of the 
schedule, it may prove useful as a test of the homogeneity of the items in an 
observation schedule as well as for reliability calculations. (Author/JS) 

Reliability of an Observation of Teachers' Classroom Behavior 

Theodore Abramson 



During the last 15 years a number of observational schedules of 
teachers' classroom behavior have been formulated (Medley & Mitzel, 
1963). The reliability of the observer teams using these instruments 
has often been dealt with superficially. 

Recently, Denny (1968) used an analysis of variance (ANOVA) 
technique to calculate the reliability of an observation schedule. 

The ANOVA technique used was essentially that first proposed by Medley 
and Mitzel (1958, 1963) and permitted the partialling out of various 
sources of variance as well as the calculation of a reliability 
coefficient. The Medley and Mitzel (1963) technique required that the 
same observers visit the same teachers a number of times. The 
difficulties of maintaining a sizable observer team intact for an extended 
period of time and of observing the same teachers a number of times 
make the ANOVA model difficult to apply in many typical situations which 
call for observational data. 

This paper presents an ANOVA model which permits partialling out of 
variances and reliability calculations when the observer team does not 
have the same observers throughout and when the observation of each 
teacher occurs only once. It is felt that this model is applicable to 
many observer team situations, since the typical team is trained by 
observing the same phenomena as a group and then comparing observations. 
The model is then applied to the "live" portion of the School University 
Teacher Education Center (SUTEC) observation team data. 

Method 

Since each teacher is observed by an observer team peculiar to him- 
self, the model may be considered a partially hierarchical design. That 
is, each observer team has the same number of observers but not necessar- 
ily the same observers, and therefore the observer team factor is nested 
under the teacher factor. If teachers are factor A, observers factor B, 
and items factor C, B would be nested under A. Assuming that there are 
n scores on each item for each teacher per observer the sources of vari- 
ation, degrees of freedom, and expected mean squares are as given in 
Table 1 (Winer, 1962), where p, q, and r are the numbers of teachers, 
observers, and items, respectively. 

Insert Table 1 about here 



The $D_p$, $D_q$, and $D_r$ terms are equal to $1 - p/P$, $1 - q/Q$, and 
$1 - r/R$, respectively, where p and P, q and Q, and r and R are the sample 
and population numbers of teachers, observers, and items, respectively. 
Each of these D's is either 0 or 1 depending on whether the corresponding 
factor is fixed or random. 

As was pointed out by Medley and Mitzel (1963), the assignment of a 
variable as fixed tends to reduce the error of measurement and hence to 
inflate the reliability. The assumption that a variable is fixed should 
therefore be based on sound reasons. A rule of thumb for selecting which 
factors are fixed and which are random is to decide whether other elements 
comprising the factor might have been used; if so, the factor is 
random (Medley & Mitzel, 1963). For example, if no observers other than 
the ones actually employed could have been used satisfactorily, then the 
observer factor would be fixed. Since there are always other teachers and 
observers available, theoretically anyway, these factors are considered 
random factors. 

More precisely, as p, q, and r, the numbers of sample elements, 
approach the values of P, Q, and R, the numbers of elements in the 
population, the ratios p/P, q/Q, and r/R approach a value of one and 
therefore $D_p$, $D_q$, and $D_r$ approach zero. If zeros are substituted 
for the D's, the number of terms contained in the expected mean squares 
shrinks, and thus the reliability is increased because the denominator of 
the fraction which defines the reliability coefficient is decreased. 
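
The limiting behavior of the D terms can be illustrated numerically. The sketch below is not part of the original analysis; the function name `d_term` and the population sizes are hypothetical, chosen only to show that D falls from 1 toward 0 as the sample exhausts the population.

```python
import math

def d_term(sample_size, population_size):
    """Return D = 1 - n/N for a factor with n sampled and N population elements."""
    return 1.0 - sample_size / population_size

# Hypothetical population sizes for a factor with q = 7 sampled observers.
# An infinite population corresponds to a random factor (D = 1);
# a population equal to the sample corresponds to a fixed factor (D = 0).
for population in (math.inf, 70, 14, 7):
    print(population, d_term(7, population))
```

In this illustration a factor is treated as random when its population is effectively unlimited and as fixed when every element of the population was used.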

The model is also applicable when there is only one score per 
item per observer for each teacher. In this case the model is the same 
as in Table 1 with n = 1 and the within source of variation removed. If 
all factors are random, ones are substituted for the D's and the model 
now yields an error term of $\sigma^2_{bc} + \sigma^2_e$ (Winer, 1962). 
The remaining expected mean square values follow in a similar fashion. 
To simplify the model still further the Medley and Mitzel (1963) 
procedure may be utilized. According to this procedure, the last term in 
the source of variation column, the residual, is considered to be the 
error term and is denoted by $\sigma^2_e$ rather than 
$\sigma^2_{bc} + \sigma^2_e$. The simplification of the error term and 
the substitution of ones for the n and the D's result in the expected 
mean squares shown in Table 2. 

Insert Table 2 about here 

The only major difference between the Winer (1962) and Medley and 
Mitzel (1963) approaches occurs in the F ratio testing the main effect 
of factor A. This particular F ratio utilizes the nested factor B as 
its denominator, and this denominator has a larger expected mean square 
in the simplified version than is called for by Winer (1962). The 
difference between the models is due to the $\sigma^2_{bc}$ term. This 
means that a significant F ratio testing the hypothesis $\sigma^2_a = 0$ 
in the simplified version would certainly be significant according to 
Winer (1962). Since the other two F ratios, testing the hypotheses 
$\sigma^2_c = 0$ and $\sigma^2_{ac} = 0$, use the residual mean square as 
their denominator, both the Medley and Mitzel (1963) and Winer (1962) 
approaches yield the same F values in these two cases. 
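
The sources of variation and the three F ratios discussed here can be computed directly for a balanced layout with one score per cell. The sketch below is an illustration on synthetic random numbers, not the SUTEC data; the function name `nested_anova` and the array layout (teachers, observers within teacher, items) are assumptions made for the example.

```python
import numpy as np

def nested_anova(x):
    """Sums of squares, df, and mean squares for A, B(A), C, AC, and residual.

    x has shape (p, q, r): teachers, observers within teacher, items,
    with a single score in each cell (n = 1).
    """
    p, q, r = x.shape
    grand = x.mean()
    a = x.mean(axis=(1, 2))    # teacher means, shape (p,)
    b = x.mean(axis=2)         # observer-within-teacher means, shape (p, q)
    c = x.mean(axis=(0, 1))    # item means, shape (r,)
    ac = x.mean(axis=1)        # teacher-by-item means, shape (p, r)

    ss = {
        "A": q * r * np.sum((a - grand) ** 2),
        "B(A)": r * np.sum((b - a[:, None]) ** 2),
        "C": p * q * np.sum((c - grand) ** 2),
        "AC": q * np.sum((ac - a[:, None] - c[None, :] + grand) ** 2),
        # Residual is the B(A) x C interaction when n = 1.
        "residual": np.sum(
            (x - b[:, :, None] - ac[:, None, :] + a[:, None, None]) ** 2
        ),
    }
    df = {
        "A": p - 1,
        "B(A)": p * (q - 1),
        "C": r - 1,
        "AC": (p - 1) * (r - 1),
        "residual": p * (q - 1) * (r - 1),
    }
    ms = {name: ss[name] / df[name] for name in ss}
    return ss, df, ms

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 7, 7))   # synthetic scores: p = 3, q = 7, r = 7
ss, df, ms = nested_anova(x)

f_a = ms["A"] / ms["B(A)"]       # teachers tested against observers within teachers
f_c = ms["C"] / ms["residual"]   # items tested against the residual
f_ac = ms["AC"] / ms["residual"] # teacher x item tested against the residual
print(df, f_a, f_c, f_ac)
```

For a balanced design the five sums of squares add up to the total sum of squares, which gives a quick check on the decomposition.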

There are actually two homogeneity assumptions implied by the model. 
The first is that the source of variation due to B(A) represents the 
pooled variation of observers within teachers. The second results from 
the fact that the residual term is actually the B(A) x C interaction term 
and represents the pooling of different sources of variation. The 
homogeneity assumption here is equivalent to the assumption that the 
correlation between items is constant within each of the teachers. 

The model was applied to an observation schedule which was developed 
by a research team at a teacher training institute to investigate certain 
aspects of the classroom behavior of the institution's first-year 
graduates who were teaching in the New York City public school system. The 
observer team was to observe only the following seven categories of 
behavior: teacher mobility, involvement of children, materials present, 
materials in use, directed behavior, spontaneous behavior, and irrelevant 
acts. These items are briefly described below. More detailed descrip- 
tions are available (Chapline, 1968). 

Teacher mobility. The number of different positions occupied by 
the teacher during the second five minutes of each learning activity, 
indicated on a room sketch. 

Involvement of children. A global judgement of the attentiveness 
of the whole class during each learning activity, assessed on a three 
point scale from uninvolved (1) to highly involved (3). 

Materials present. The number of different materials present 
during the entire observation, checked on a list of materials. 

Materials in use. The number of different materials in use during 
the entire observation, checked on a list of materials. 

Directed behavior. The number of times during each activity that 
the teacher called on pupils without the pupils first indicating a 
willingness to respond. 

Spontaneous behavior. The number of times that the pupils 
indicated a willingness to respond before being asked to do so plus the 
number of times that the pupils responded spontaneously before 
permission was granted. The two counts were weighted in a ratio 
of 1:2, respectively, before being added. Raising a hand would be 
scored as a one while calling out the answer would be scored as a two. 
If both occurred during the same activity, the activity would be scored 
as a three provided nothing else happened for the duration of the 
activity. 

Irrelevant acts. The number of behaviors obviously not related 
to the learning activity of 12 randomly selected children. Each child 
was intensively observed for a two minute period. 
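
The 1:2 weighting used for the spontaneous behavior category amounts to a small scoring rule. The sketch below restates it; the function and argument names are hypothetical and serve only to reproduce the worked example given in the item description.

```python
def spontaneous_score(hand_raises, call_outs):
    """Weight signals of willingness by 1 and unprompted responses by 2."""
    return 1 * hand_raises + 2 * call_outs

# One raised hand and one called-out answer in the same activity.
print(spontaneous_score(1, 1))  # 3
```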

Three teachers were observed once through a one way glass by three 
different observer teams. Each observer team contained seven members, 
but some of the observers were not the same throughout all the obser- 
vations and therefore the teams were considered different. 

In line with the earlier discussion of random and fixed variables, 
the teacher and observer factors were considered random factors, but 
because the observers were instructed to disregard all behaviors other 
than those on the observation schedule the items were considered fixed. 
Accordingly, the $\sigma^2_{ac}$ term in the first and third lines of 
Table 2 was dropped from the expected mean squares for teachers and 
items, respectively. The actual and expected mean squares for this 
specific situation in which p = 3, q = 7, and r = 7 are given in Table 3. 

Insert Table 3 about here 

The notations for the observed mean squares used in Table 3 and the 
symbol "≐" (to be read "is estimated by") in Table 4 come from Medley 
and Mitzel (1953). 

The general set of linear equations which must be solved to find 
the estimated variance components is constructed by setting the 
estimated mean square terms equal to their corresponding observed mean 
squares. The resulting linear equations are then solved simultaneously. 
Table 4 gives the particular set of linear equations for the specific 
case listed in Table 3 and the resulting estimated values of the 
variances for each factor. 

Insert Table 4 about here 
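
Solving the equations simultaneously is a small linear-algebra problem. Because Tables 3 and 4 are not reproduced in this copy, the sketch below assumes the expected mean squares implied by the reliability formula in the Results (items fixed, teachers and observers random) and builds stand-in mean squares from the variance components reported there; the items equation is omitted for the same reason. It is a consistency check under those assumptions, not the author's computation.

```python
import numpy as np

q, r = 7, 7  # observers per team and items, as stated in the text

# Assumed coefficient rows for the unknowns
# [var_a, var_b(a), var_ac, var_e]; one row per mean square.
coef = np.array([
    [q * r, r, 0, 1],   # teachers:                  qr*var_a + r*var_b(a) + var_e
    [0,     r, 0, 1],   # observers within teachers: r*var_b(a) + var_e
    [0,     0, q, 1],   # teacher x item:            q*var_ac + var_e
    [0,     0, 0, 1],   # residual:                  var_e
], dtype=float)

# Variance components as reported in the Results section.
reported = np.array([0.7222, 0.5376, 7.6460, 2.8983])

# Stand-in "observed" mean squares reconstructed from those components.
mean_squares = coef @ reported

# Setting expected equal to observed mean squares and solving simultaneously.
estimates = np.linalg.solve(coef, mean_squares)
print(mean_squares)
print(estimates)
```

Under these assumptions the ratios of the teacher mean square to the observers-within-teachers mean square, and of the interaction mean square to the residual, come out near 6.31 and 19.47, in line with the F ratios reported in the Results.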



Results 

The three hypotheses $\sigma^2_a = 0$, $\sigma^2_c = 0$, and 
$\sigma^2_{ac} = 0$ were all rejected because their respective F ratios, 

$$F_a = \frac{MS_a}{MS_{b(a)}} = 6.3124, \qquad F_c = \frac{MS_c}{MS_{\text{residual}}} = 117.3776, \qquad F_{ac} = \frac{MS_{ac}}{MS_{\text{residual}}} = 19.4667,$$

were all significant at the .01 level. The appropriate df's are given 
in Table 3. The rejection of these three hypotheses indicated that the 
scale does differentiate between teachers and items, and that there is a 
significant interaction between these two non-nested factors. 

The overall reliability coefficient (Medley and Mitzel, 1963) is 
equal to $\sigma^2_{\tau} / \sigma^2_x$. Here 

$$\sigma^2_{\tau} = (qr)^2 \sigma^2_a = 49^2 (.7222) = 1734.0022$$

and 

$$\sigma^2_x = qr\,(qr\,\sigma^2_a + r\,\sigma^2_{b(a)} + q\,\sigma^2_{ac} + \sigma^2_e) = (7 \cdot 7)\,[(7 \cdot 7)(.7222) + 7(.5376) + 7(7.6460) + 2.8983] = 4682.9937.$$

Therefore, $R_{xx} = 1734.0022 / 4682.9937 = .37$. 

The .37 reliability coefficient indicated that 37% of the variance 
was attributable to the teacher factor and 63% of the variance was due 
to the items, interaction, and residual factors. An examination of the 
ratio of the variances due to teachers and observers, the factor nested 
under teachers, indicated that, of the 37% component attributed to 
teachers, 21.2% and 15.8% of the total variance were due to teachers and 
observers, respectively. A similar calculation for the other factors 
comprising the remaining 63% of the total variance yielded values of 
38.0%, 13.1%, and 6.9% for the items, interaction, and error or residual 
terms, respectively. 
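
The arithmetic behind the reliability coefficient can be checked with a few lines of code. The sketch below uses the variance components and the values p = 3, q = 7, r = 7 quoted in the text; the variable names are hypothetical, and the coefficient q on the interaction term is read from a damaged formula, so it should be treated as an assumption.

```python
q, r = 7, 7  # observers per team and items

# Estimated variance components quoted in the Results.
var_a = 0.7222       # teachers
var_b_in_a = 0.5376  # observers within teachers
var_ac = 7.6460      # teacher x item interaction
var_e = 2.8983       # residual

# Numerator and denominator of the reliability coefficient.
sigma2_tau = (q * r) ** 2 * var_a
sigma2_x = q * r * (q * r * var_a + r * var_b_in_a + q * var_ac + var_e)

reliability = sigma2_tau / sigma2_x
print(round(sigma2_tau, 4), round(sigma2_x, 4), round(reliability, 2))
```

The script reproduces the numerator 1734.0022, the denominator 4682.9937, and a coefficient that rounds to .37.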

Discussion 

The proposed model did permit the partitioning of the variance 
associated with an observational schedule into its component parts and 
the calculation of an overall reliability coefficient. In the particular 
case to which the model was applied 75% of the variance was due to 
teachers and items, each of these two factors contributing equally to 
the total variance. Only 15.8% of the total variance was due to 
observers, the factor nested under teachers. These facts permit one to 
conclude that the variance due to different observers being used was 
considerably lower than that due to the different teachers as they were 
observed on the various types of behavior represented by the items of the 
observational schedule. 

That the items accounted for the single largest source of variance 
was probably due to the very different elements of behavior being observed. 
For example, materials present required very little judgement on the part 
of the observer, while involvement of children required a great deal of 
judgement. Indeed, one of the proposed future uses of the paradigm 
presented is as a test of the homogeneity of the items in observational 
schedules, and therefore the model may be found useful when applied to 
data based on other observational instruments. 

TABLE 1 

Sources of Variation, Degrees of Freedom, and Expected 
Mean Squares for an ANOVA Design with Factor B 
Nested Under Factor A 

TABLE 2 

ANOVA Design with Factor B Nested Under 
Factor A, All Factors Random, and n = 1 

Source of Variation 

TABLE 3 

Analysis of Variance of an Observation Schedule Containing 
Seven Items and Using Three Observer Teams and 
Three Teachers 

Source of Variation 

TABLE 4 

Estimation of Variance Components for an Observation Schedule 
Containing Seven Items and Using Three Observer Teams 
and Three Teachers 